Language Model and Grammar Extraction Variation in Machine Translation
Abstract
This paper describes the system we developed to improve German-English translation of news text for the shared task of the Fifth Workshop on Statistical Machine Translation. Working within cdec, an open-source modular framework for machine translation, we explore the benefits of several modifications to our hierarchical phrase-based model, including segmentation lattices, minimum Bayes risk decoding, grammar extraction methods, and varying language models. Furthermore, we analyze decoder speed and memory performance across our set of models and show that there is an important trade-off to be made.
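As a rough illustration of minimum Bayes risk (MBR) decoding, one of the modifications named in the abstract, the sketch below reranks a k-best list by expected gain rather than by model probability alone. It is not cdec's implementation: the function names (`ngram_counts`, `overlap_gain`, `mbr_select`), the toy k-best list, and the n-gram overlap gain (a simple stand-in for a sentence-level BLEU-style metric) are all assumptions made for this example.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def overlap_gain(hyp, ref, max_n=2):
    """Average clipped n-gram precision of hyp against ref (assumed gain function)."""
    total = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngram_counts(hyp, n)
        ref_counts = ngram_counts(ref, n)
        denom = sum(hyp_counts.values())
        if denom == 0:
            continue  # hypothesis too short to contain n-grams of this order
        matched = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total += matched / denom
    return total / max_n


def mbr_select(kbest):
    """Return the hypothesis with the highest expected gain.

    kbest is a list of (tokens, probability) pairs; probabilities are
    renormalized over the list, which serves as the evidence space.
    """
    z = sum(p for _, p in kbest)
    best, best_gain = None, float("-inf")
    for hyp, _ in kbest:
        expected = sum((p / z) * overlap_gain(hyp, other) for other, p in kbest)
        if expected > best_gain:
            best, best_gain = hyp, expected
    return best


# Toy k-best list (hypothetical data): the most probable hypothesis is an outlier.
kbest = [
    ("completely different words here".split(), 0.40),
    ("the house is small".split(), 0.35),
    ("the house is little".split(), 0.25),
]
print(" ".join(mbr_select(kbest)))
```

In this toy list the single most probable hypothesis shares no n-grams with the others, so MBR prefers a lower-probability hypothesis that agrees with the rest of the list. The cost is a quadratic number of gain computations in the list size, which is one way this kind of decoding can affect speed.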
Similar resources
A new model for Persian multi-part words edition based on statistical machine translation
Multi-part words in English are hyphenated, with the hyphen separating the different parts. Persian has multi-part words as well. Under Persian morphology, a half-space character is needed to separate the parts of a multi-part word, but in many cases people incorrectly use a space character instead. This common incorrect use of the space character leads to some s...
The Relationship between EFL Learners’ Explicit Knowledge of Source Language and Their Translation Ability
The purpose of this study was to investigate the relationship between students' explicit knowledge of grammar and their translation ability. The importance of grammatical knowledge and its effect on translation quality motivated the researcher to run this study and to consider grammatical knowledge in Persian as the source language of Iranian students. It is clear that grammar is an area ...
Joshua: An Open Source Toolkit for Parsing-Based Machine Translation
We describe Joshua, an open source toolkit for statistical machine translation. Joshua implements all of the algorithms required for synchronous context-free grammars (SCFGs): chart-parsing, n-gram language model integration, beam- and cube-pruning, and k-best extraction. The toolkit also implements suffix-array grammar extraction and minimum error rate training. It uses parallel and distributed c...
Kriya - An end-to-end Hierarchical Phrase-based MT System
This paper describes Kriya – a new statistical machine translation (SMT) system that uses hierarchical phrases, which were first introduced in the Hiero machine translation system (Chiang, 2007). Kriya supports both a grammar extraction module for synchronous context-free grammars (SCFGs) and a CKY-based decoder. There are several re-implementations of Hiero in the machine translation community, ...
Akamon: An Open Source Toolkit for Tree/Forest-Based Statistical Machine Translation
We describe Akamon, an open source toolkit for tree and forest-based statistical machine translation (Liu et al., 2006; Mi et al., 2008; Mi and Huang, 2008). Akamon implements all of the algorithms required for tree/forest-to-string decoding using tree-to-string translation rules: multiple-thread forest-based decoding, n-gram language model integration, beam- and cube-pruning, k-best hypotheses ex...